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At the cellular scale, biochemical processes are governed by random interactions 
between reactant molecules with small copy counts, leading to behavior that is in¬ 
herently stochastic. Such systems are often modeled as continuous-time Markov 
jump processes that can be described by the Chemical Master Equation. Gille¬ 
spie’s Stochastic Simulation Algorithm (SSA) generates exact trajectories of these 
systems. The amount of computational work required for each step of the original 
SSA is proportional to the number of reaction channels, leading to computational 
complexity that scales linearly as the problem size increases. The original SSA is 
therefore inefficient for large problems, which has prompted the development of sev¬ 
eral alternative formulations with improved scaling properties. We describe an exact 
SSA that uses a table data structure with event time binning to achieve constant 
computational complexity. Optimal algorithm parameters and binning strategies are 
discussed. We compare the computational efficiency of the algorithm to existing 
methods and demonstrate excellent scaling for large problems. This method is well 
suited for generating exact trajectories of large models that can be described by 
the Reaction-Diffusion Master Equation arising from spatially discretized reaction- 
diffusion processes. 
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I. INTRODUCTION 


Traditional differential equation models of chemical systems work well when the interact¬ 
ing molecules are present in high concentrations. However, at cellular scales, biochemical 
reactions occur due to the random interactions of reactant molecules present in small num¬ 
bers. These processes display behavior that cannot be captured by deterministic approaches. 
Under certain simplifying assumptions, including spatial homogeneity, these systems can 
be modeled as continuous-time Markov jump processes. The Chemical Master Equation 
(CME) describes the evolution of the probability of the system being in any state at time 
m The invariant distribution and the evolution can be found for certain classes of reaction 
networks 3 5 . However, the CME is too high dimensional to solve for many realistic models. 
Gillespie’s Stochastic Simulation Algorithm (SSA) generates exact trajectories from the dis¬ 
tribution described by the CME 6 '. Typically, an ensemble of trajectories is run to estimate 
the probability distribution. 

Consider a well-mixed (spatially homogeneous) biochemical system of S different chemi¬ 
cal species with associated populations N(t) = n 2 (t),..., ns(t)). The population is a 

random variable that changes via M elementary reaction channels {R\, i? 2 , Rm}- Each 
reaction channel Rj has an associated stoichiometry vector Uj that describes how the pop¬ 
ulation N changes when reaction Rj “fires”. The SSA (and CME) are derived by assuming 
that each reaction channel is a non-homogeneous Poisson process. The stochastic rate or 
intensity of reaction Rj is determined by a propensity function , denoted a,-, defined ad®® 

aj(N)dt = probability that reaction Rj (1) 

will occur in [t, t + dt). 

The definition in (|TJ) describes a property of an exponential distribution. The time r and 
index j of the next reaction event, given the current state N and time t, can therefore be 
characterized by the joint probability density function 

p(r,j\N,t) = e~^ i = 1<li< ' N ' >T aj(N). (2) 

Equation (J2| combined with the fact that the process is Markovian (a consequence of the 
well-mixed assumption) leads naturally to the SSA. 

We will refer to the (exact) “SSA” as any simulation algorithm that generates trajectories 
by producing exact samples of r and j from the density in (|2]). Note that when other sources 
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describe “the SSA” or “the Gillespie algorithm”, they are often implicitly referring to the 
direct method variant of the SSA (see Section If A) 67 . However, many other SSA variants 
have been proposed that achieve different performance and algorithm scaling properties by 
utilizing alternate data structures to sample the density function (J2]) 9 17 . Many of these 
methods apply random variate generation techniques described in Devroye 18 to the SSA. 
Among the most popular of these alternate formulations of the SSA for large problems is 
the Next Reaction Method (NRM) of Gibson and BruclP. We discuss the NRM and other 
formulations in detail in later sections. Many authors have also described approximate 
methods that sacrifice exactness in exchange for computational efficiency gains, including 
the class of tau-leaping algorithms and various hybrid approached 19 29 . In this paper, we 
restrict our attention to exact methods. 

An important class of problems in which large models arise come from spatially-discretized 
reaction diffusion processes. The assumption underlying the CME and SSA that the reactant 
molecules are spatially homogeneous is often not justified in biological settings. One way to 
relax this assumption is to discretize the system volume into N s subvolumes and assume that 
the molecules are well-mixed within each subvolume. Molecules are allowed to move from one 
subvolume to an adjacent one via diffusive transfer events. This setting can also be described 
as a Markov jump process and leads to the reaction-diffusion master equation (RDME) 30 31 . 
In simulating the RDME, reaction events within each subvolume are simulated with the 
standard SSA and diffusive transfers are modeled as pseudo-first order “reactions”. The 
resulting simulation method is algorithmically equivalent to the SSA (see Gillespie et alP 1 for 
an overview). However, the state space and the number of transition channels grows quickly 
as the number of subvolumes increases. The population of every species must be recorded in 
every subvolume and the number of transition channels includes all diffusive transfer channels 
plus each reaction channel in the homogeneous system is effectively duplicated once for each 
subvolume. In a spatial model, M is the total number of reaction and diffusion channels. 

In this paper we present an exact SSA variant that is highly efficient for large problems. 
The method can be viewed as a variation of the NRM that uses a table data structure in 
which event times are stored in “bins”. By choosing the bin size relative to the average 
simulation step size, one can determine an upper bound on the average amount of compu¬ 
tational work per simulation step, independent of the number of reaction channels. This 
constant-complexity NRM is best suited for models where the fastest timescale stays rela- 
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tively constant throughout the simulation. Optimal performance of the algorithm is sensitive 
to the binning strategy, which we discuss in detail in Section |IV A 


The remainder of the paper is organized as follows. In the next section, we discuss the 


standard direct and NRM methods in more detail. In Section III we demonstrate how 
alternate methods with various scaling properties can be derived by modifying the standard 


methods. Section (IV) presents the constant-complexity NRM algorithm and the optimal 
binning strategy. The performance and scaling of several methods is presented in Section 
[Vj Section VI concludes with a brief summary and discussion. 


II. STANDARD METHODS FOR SPATIALLY HOMOGENEOUS MODELS 

The majority of exact SSA methods are based on variations of the two standard imple¬ 
mentations: the direct method and the NRM. Here we review these standard methods in 
more detail. 


A. Direct Method 

The direct method variant of the SSA is the most popular implementation of the Gillespie 
algorithm^ 7 . In the direct method, the step size r is calculated by generating an exponen¬ 
tially distributed random number with rate parameter (or intensity) equal to the sum of the 
propensities. That is, r is chosen from the probability density function 

f(r) = e“ E ^ iai(A ° T , r > 0. (3) 

The index of the next reaction, j , is an integer-valued random variable chosen from the 
probability mass function 

m= ^M ai (4) 

E i=1 a ii N ) 

The direct method is typically implemented by storing the propensities in a one dimensional 
data structure (e.g. an array in the C programming language). Generating the reaction 
index j as described in Q can be implemented by choosing a continuous uniform random 

number r between 0 and the propensity sum and choosing j such that 

j 

j = smallest integer satisfying ^^aj(iV) > r. (5) 

i—1 
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This “search” to find the index of the next reaction proceeds by iterating over the elements 
in the array until condition ([5]) is satisfied. After the time step size and reaction index are 
selected, the simulation time t and system state N are updated as t = t + r and N = N + Uj. 
Changing the population N will generally lead to changes in some propensities. These 
affected propensities are recalculated by evaluating the propensity functions for the new 
population N and the new propensity values replace the old values in the propensities array 
data structure. The propensity sum is adjusted accordingly. The algorithm repeats until a 
termination condition is satisfied. Typically, an ensemble of many simulation trajectories is 
computed. To begin each simulation, the system time is reset to zero and the state vector 
is set to the initial condition and the algorithm is repeated. 

An analysis of the scaling properties for a single step of the direct method, or any SSA 
variant, can be done by considering the computational work required for the two primary 
tasks of the algorithm: 1) searching for the next reaction index, and 2) updating the reaction 
generator data structure (see footnote 32 ). If the propensities are stored in no particular order 
in the linear array, the search will take approximately M/2 steps on average. If knowledge 
of the average magnitudes of the propensities is available, then the average search can be 
shortened from M /2 to some smaller fraction of M by utilizing static or dynamic sorting, 
which can speed up the simulation 33 31 . In both the sorted and unsorted case, the search 
step has computational cost that scales as O(M). In updating the propensity array data 
structure, the following operations occur for each propensity that must be updated: the 
propensity function is evaluated, the propensity value is updated in the propensity array, and 
the propensity sum is updated. In a typical implementation, these are all 0(1) operations as 
the amount of computational work does not depend on the problem size. The total cost of 
the update step therefore depends on how many propensities must be updated when a given 
reaction occurs. The average cost of the update step will be this number of dependencies for 
each reaction, weighted by the firing frequencies of the reactions. The total cost of each step 
of the direct method is therefore 0(M) for the search for the next reaction plus an update 
cost D , where D is proportional to the average number of propensity updates required per 
simulation time step. We assume that D is bounded above by a constant independent of M, 
making the direct method an 0(M) algorithm overall. The SSA direct method is easy to 
implement with simple data structures that have low overhead, making it a popular choice 
that performs well for small problem sizes. 
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B. Next Reaction Method 


In his original 1976 paper^, Gillespie also proposed the first reaction method, which in¬ 
volves generating the (tentative) next reaction time for every reaction channel and then 
finding the smallest value to determine the next event that occurs. Like the direct method, 
this can be implemented using a simple ID array data structure. The smallest event time is 
then found by iterating over all the elements. The direct method and first reaction method 
both generate exact simulation trajectories and both scale as O(M). However, in practice 
the direct method tends to be more efficient for standard implementations, primarily be¬ 
cause the direct method requires about M/2 operations per step whereas the first reaction 
method requires M operations to fold the smallest element. 

The NRM is a variation of the first reaction method that uses a binary min-heap to store 
the next reaction timed®. A binary min-heap is a complete binary tree in which each node 
contains a value that is less than (or equal to) its children. The search for the next reaction is 
therefore a constant-complexity, or 0(1), operation because the next reaction corresponds to 
the smallest reaction time, which is always located at the top of the heap. However, updating 
the binary min-heap data structure when a propensity changes is computationally expensive. 
For each affected propensity, the propensity function has to be evaluated, the reaction time 
has to be recomputed based on this new propensity, and the binary min-heap must be 
updated to maintain the heap property. Maintaining the heap structure generally requires 
0(log 2 (M)) operations for each affected event time, implying that the total computational 
cost of the update step is DO{log 2 {M )) = 0(log 2 (M )), where D is again the average number 
of propensity updates required per stepf 9 . In practice, the actual update cost will depend 
on many factors, including the extent to which the fastest reaction channels are coupled 
and whether or not the random numbers are “recycled”^ 9 16 . The interested reader should 
consult Gibson and Brack 9 for a detailed analysis. The 0(log 2 (M )) scaling makes the NRM 
superior to the direct method for large problems. 


III. ALTERNATIVE FORMULATIONS 

There are two main techniques for improving the scaling properties of an exact SSA im¬ 
plementation. The first is to use different data structures, as in the binary tree utilized 
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for the NRM min-heap implementation. The other technique is to split the search problem 
into smaller subproblcms by partitioning the reaction channels into subsets. The two tech¬ 
niques are complementary, and in some cases conceptually equivalent. By utilizing different 
combinations of data structures and partitioning schemes, it is possible to define an infinite 
number of alternate SSA formulations with varying performance and scaling properties. 

To fix ideas, we consider a simple variation of the direct method. Instead of storing all 
M propensities in a single array, one could partition the reaction channels into two subsets 
of size M/2 and store them using two arrays. If we label the propensity sums of the two 
subsets as ] and as 2 , respectively, then the probability that the next reaction will come from 
subset i is as i /(as 1 + cts 2 )- Conditioned on the next reaction coming from subset i, the 
probability that the next reaction is Rj will be a 3 / a s %1 for all j in subset i. These statistical 
facts follow from the properties of exponential random variables and lead naturally to a 
simulation algorithm. We first select the subset i from which the next reaction will occur. 
To do this, we choose a continuous uniform random number rq between 0 and the propensity 
sum and then chose i such that 

i 

i = smallest integer satisfying (6) 

k=1 

The similarity to (J5| is apparent as we are essentially performing the direct method’s linear 
search but applied to subset propensity sums instead of propensities. We have written (J6]) 
in a generic way to accommodate more than two subsets, which we will consider in the next 
subsection. Once we have determined the subset i from which the next reaction will occur, 
we can use the direct method’s linear search from (|5| again to select the reaction j from 
within subset i, but choosing the uniform random number between 0 and as t and iterating 
over only the elements in the array corresponding to subset i. 

The resulting algorithm can be viewed as a “2D” version of the direct method. The 
algorithm requires, on average, a search of depth 1.5 to choose the subset, assuming the 
reaction channels were partitioned in no particular order. The search within the subset 
will then require an average search depth of M/4 to select the particular reaction within 
the chosen subset. By partitioning the reaction channels into two subsets, we have derived 
an algorithm that is statistically equivalent to the direct method but with different scaling 
properties. The new method has slightly more overhead than the original direct method 
because we must update the subset propensity sums at each step. It will therefore be less 
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efficient than the original direct method for small problems. However, the M/4 scaling 
will outperform the M/2 scaling of the direct method for larger problems. The algorithmic 
complexity of this variation is still O(M), so the NRM will also outperform this method for 
sufficiently large problems. Below we show how more sophisticated partitioning and data 
structures can be used to achieve improved scaling properties. 


A. Alternate Direct Formulations 

One can expand on the idea above to create other direct method variants. If each of 
the two subsets is then split into two more subsets, a similar procedure can be applied to 
derive a method with M/8 scaling. If this processes is repeated recursively, the resulting 
partitioning of the reaction channels can be viewed as a binary tree 9 1211 . This logarithmic 
direct method has 0(/oc/ 2 (M)) cost for the search and update steps. If instead of partitioning 
into two subsets, the reaction channels were partitioned into \[M subsets each containing 
a/M channels, the search for the subset and the search within the subset are both 0(a/M) 
operations, leading to 0(\/M) algorithmic complexity. This is the theoretically optimal “2D 
search” formulation. Similarly, one can define a three-dimensional data structure, leading 
to a “3D search” which scales as 0(-\/M), and so onf 12 17 . 

Slepoy et al. 10 proposed a constant-complexity (0(1)) algorithm that uses a clever par¬ 
titioning strategy combined with rejection sampling. First, the reaction channels are parti¬ 
tioned based on the magnitude of their propensities with partition boundaries corresponding 
to propensities equal to powers of two. That is, if we number the partitions using (positive 
and negative) integers, partition i contains all of the reaction channels with propensities 
in the range [2*,2* +1 ). The particular partition g from which the next reaction will occur 
is chosen exactly via a linear search. The average search depth to select the subset will 
depend on the range between the largest and smallest nonzero propensity value. This search 
is assumed to have a small average search depth, independent of M. Once the subset g 
from which the next reaction will occur is determined, a rejection sampling technique is 
employed to select the particular channel within the subset. An integer random number 
r\ is chosen between [1 ,M g \, where M g is the number of reaction channels in the chosen 
subset. This tentatively selects the reaction channel j corresponding to the rf 1 channel in 
the subset. Then a continuous random number r 2 is chosen between [0, 2 9+1 ). If r 2 < cij(x), 






reaction channel j is accepted as the next event, otherwise it is rejected and new random 
numbers rq and r 2 are chosen. The process is repeated until a reaction is accepted. This 
procedure selects the reaction according to the exact probability density function. Since 
the subsets have been engineered such that all propensities in subset g are within the range 
[2 9 ,2 9+1 ), on average less than half of the selected reactions will be rejected. Therefore, 
it takes fewer than two samples on average to select the next reaction, independent of the 
number of reaction channels. Updating the data structure for each affected propensity is also 
an amortized constant-complexity operation that requires, in the worst case, removing an 
element from one partition and inserting it into a different partition, which occurs whenever 
the change in propensity crosses a power of two boundary. The search and update steps are 
both constant-complexity, leading to 0(1) algorithmic complexity independent of M. 


B. Next Subvolume Method 

The preceding methods are general formulations that can be applied to any model that can 
be described by the CME. In this subsection we describe the next subvolume method (NSM, 
not to be confused with the NRM), which is formulated specifically for simulating processes 
described by the RDME. The NSM is a variation of a 3D search method 35 that partitions 
the channels (reaction and diffusion) based on the spatial structure of the model. The NSM 
has several desirable properties that make it efficient, and hence popular, for simulating 
spatial models. The NSM first partitions on subvolumes and uses the NRM (implemented 
using a binary min-heap) to select the subvolume in which the next event occurs. Within 
each subvolume, the diffusion channels and reaction channels are stored using two arrays, 
similar to the two-partition direct method scheme from the beginning of this section. To 
choose the particular event within the subvolume, the NSM first does a linear search to 
determine whether the next reaction is diffusion or a reaction, then chooses the particular 
event channel via a linear search within that partition. Among the favorable properties of 
the NSM is that organizing the events by subvolume tends to keep the updates local in 
memory. Partitioning by subvolume ensures that an event affects at most two subvolumes 
(which occurs when the event is a diffusive transfer), therefore, at most two values in the 
binary min-heap need to be updated on each step of the algorithm. The NSM scales as 
0(log2(N s )), where N s is the number of subvolumes. In practice, the NSM performs well on 
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spatial models spanning a wide range of problem sizes. 

Recently, Hn et al. presented a method in which the NSM search is effectively reversed—- 1 . 
First, the type of event is selected, then the subvolnme is chosen using a binary, 2D, or 
3D search, leading to algorithmic complexity that is 0(log2(N s ), 0(y/N^), or O(tfNl), 
respectively. 


IV. CONSTANT-COMPLEXITY NEXT REACTION METHOD 

The NRM was derived by taking the basic idea of the first reaction method and utilizing 
a different data structure to locate the smallest event time. The abstract data type for this 
situation is known as a priority queue and in principle any correct priority queue implemen¬ 
tation can be used. Here we present an implementation of the next reaction method that uses 
a table data structure comprised of “bins” to store the event times for the priority queue. If 
the propensity sum, and therefore the expected simulation step size, can be bounded above 
and below, then the average number of operations to select the next reaction and update the 
priority queue will be bounded above, independent of M, resulting in a constant-complexity 
NRM. 

To implement a constant-complexity NRM, we partition the total simulation time inter¬ 
val, from the initial time t = 0 to final time t = Tf, into a table of K bins of width W. For 
now, we will let WK = Tf. Bin Bi will contain all event times in the range [iW, (i + 1 )W). 
We generate putative event times for all reaction channels as in the original NRM, but insert 
them into the appropriate bin in the table instead of in a binary min-heap. Events that fall 
outside of the table range are not stored in the table. Values within a bin are stored in a 
ID array (though alternative data structures could be used). To select the next reaction, 
we must find the smallest element. Therefore, we must find the first nonempty bin (i.e. 
the smallest i such that Bi is not empty). To locate the first non-empty bin, we begin 
by considering the bin i from which the previous event was found. If that bin is empty, 
we repeatedly increment i and check the i th bin until a nonempty bin is found. We then 
iterate over the elements within that bin to locate the smallest value. The update step of 
the algorithm requires computing the new propensity value and event time for each affected 
propensity. In the worst case, the new event time will cause the event to move from one bin 
to another, which is an 0(1) operation. Therefore, the update step will be an 0(D) oper- 
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FIG. 1. Priority queue using binning. The next reaction in a simulation is the event with the 
smallest event time. Locating the smallest event time requires first locating the smallest non¬ 
empty bin, then finding the smallest element within that bin. Elements in the bins contain the 
event time and the event (reaction) index. Bin boundaries are integers here for simplicity. Elements 
within a bin are unsorted. Elements with zero propensity or with an event time beyond the last 
bin are not stored in the queue. In the top figure, the first bin, Bq, contains the smallest element 
as indicated by the arrow. After that event occurs, the search for the next nonempty bin begins 
at the current bin and, if empty, the bin pointer is incremented until the next nonempty bin is 
located, as indicated by the arrow under bin B 2 in the bottom figure. Here the next event is “-R 7 ”. 
In a real simulation, the update step may cause elements to move from one bin to another. 


ation, where again D is the average number of propensity updates required per simulation 
step. We assume that D is bounded above independent of M. Therefore, if the propensity 
sum is bounded above and below independent of M, the overall complexity of the algorithm 
is 0(1). In the next section we consider the optimal bin width to minimize the cost to select 
the next reaction. 
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A. Optimal Bin Width 


Storing the elements within a bin using a ID array is similar to the chaining approach 
(sometimes referred to as “closed addressing”) to collision resolution in a hash table. A 
hash table is a data structure in which values are mapped to bins using a hash function. 
Since there are generally more possible values than total bins, collisions, where multiple 
values are mapped to the same bin, are possible. Chaining is a collision resolution strategy 
where multiple values that map to the same bin are stored in a linked list. A well-designed 
hash table typically has amortized constant time insertion and retrieval. (Interested readers 
should consult an introductory computer science data structures textbook.) An important 
difference between our data structure and a hash table implementation is that hash tables 
are typically designed to have a particular load factor. The load factor is defined as the 
number of elements stored in the table divided by the number of bins. For a hash table with 
a good hash function, a low load factor ensures that each bin will contain a small number 
of elements, on average. However, we are effectively using the reaction event times as a 
hash function. Whereas a good hash function distributes elements uniformly amongst the 
bins, the reaction times are not uniformly distributed, they are exponentially distributed. 
Targeting a particular load factor could lead to good or bad performance depending on the 
distribution of the propensities, because the key to efficiency is choosing the appropriate bin 
width relative to the mean simulation step size. If the bin width is too small, there will be 
many empty bins and if the bin width is too large there will be many elements within each 
bin. 

In considering the search cost and optimal bin width, it is helpful to consider two extreme 
cases. For the first case, suppose one reaction channel is “fast” and the rest are slow. For the 
second case, suppose all reaction channels have equal rates (propensities). For simplicity of 
analysis we can rescale time so that the propensity sum equals one and we assume Tf 3> 1. 
Then in the first case, if the propensity of the fast channel is much larger than the sum of 
the slow propensities (i.e. the fast propensity is approximately equal to one), we can choose 
the bin width to be large on the scale of the fast reaction but small on the scale of the slow 
reactions. For example, choosing W ~ 6.64 (corresponding to the 99 th percentile for a unit 
rate exponential distribution) ensures that the fast reaction will initially be in the first bin 
with approximately 99% probability. By assumption, there is a small probability that any 
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of the slow reactions will appear in the first bin. Upon executing the simulation, the fast 
reaction will repeatedly appear in the first bin and be selected during the next step, until it 
eventually appears in the second bin (with probability < 1% of landing beyond the second 
bin). If it takes on average roughly 6.6 steps before the fast reaction appears in the second 
bin, then the average search depth to locate the first nonempty bin is about 1 + /g « 1.15 
and the average search depth within a bin is approximately one. The “total search depth” 
will be approximately 2.15. The slow reactions contribute a relatively small additional cost 
in this scenario. If, however, the slow reactions are not negligible, then the fast reaction 
plays a less important role in the search cost and the situation can be viewed as similar to 
the second case, which we consider next. 

Here we suppose that all reaction channels have equal propensities and the propensity 
sum equals one. In this case, the number of elements that will initially be placed in the first 
bin will be approximately Poisson distributed with mean W. As the simulation progresses, 
elements will be removed from the first bin until it is emptied and the simulation will move 
on to the second bin where the process repeats. If the number of events per bin is Poisson 
distributed with rate W, the average search depth to locate the first nonempty bin is 1/PU+l 
and the average search depth within a bin is W/2 + 1. The total search depth is minimized 
when W = \/2, leading to a total search depth of 2 +a/ 2 ~ 3.41. That is, it takes an average 
of about 3.41 operations to select the next reaction, independent of the size of the model. 
If the propensity sum does not equal one, this minimum total search depth will be achieved 
with a bin width of W = \/2 )/))/, a/a/. 

The theoretical optimal relative bin width W = y/2 does not minimize the search cost 
in an actual implementation. Figure [2] shows that the search cost is minimized at a bin 
width much larger than y/2. One reason for this is that accessing consecutive items within 
a bin is generally faster than traversing between bins because items within a bin are stored 
in contiguous blocks of memory. In our experience, a bin width of approximately 16 times 
the mean simulation step size performs well across a wide range of problem sizes. Widths 
between 8 and 32 times the step size perform well, making the choice of 16 robust to modest 
changes in the propensity sum. However, it is possible that the optimal values may vary 
slightly depending on the system architecture. More importantly, if the propensity sum 
varies over many orders of magnitude during a simulation, a static bin width may be far 
from optimal during portions of the simulation. 
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FIG. 2. Search cost for various bin widths. In this example, M = 10' and every propensity = ICG' 
(the propensity sum equals one). The theoretical optimal bin width for minimizing the total search 
depth corresponding to W = \/2 is shown as a dashed vertical line. In practice, the true optimal 
bin width is larger than W = y/2. A bin width of 16 times the mean simulation step size performs 
well over a wide range of problem sizes. Search cost is measured in seconds for 10 7 simulation 
steps. 

B. Optimal Number of Bins and Dynamic Rebuilding 

It is not necessary to choose the number of bins K such that WK > Tf, where again 
Tf is the simulation end time. Clearly, choosing K such that W(K — 1) > Tf means that 
the table is larger than necessary, which is inefficient as larger memory use leads to slower 
simulations. However, it is less obvious that a choice of K such that WK < Tf can lead 
to improved performance. When WK < Tf, the table data structure must be rebuilt when 
the simulation time exceeds WK. A tradeoff exists between choosing K large, which is less 
efficient because it uses more memory, and choosing K small, which requires more frequent 
rebuilding of the table data structure. The propensity and reaction time “update step” also 
benefits slightly from a smaller table because fewer reaction channels will be stored in the 
table leading to fewer operations required to move reactions from one bin to another. Figure 
[3] shows the elapsed time to execute simulations for varying bin widths and numbers of bins. 

As the problem size increases, the optimal number of bins gets larger due to the increased 
rebuild cost. We have found that the optimal number of bins scales roughly proportional to 
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FIG. 3. Simulation time for various bin widths and number of bins. In this example, M = 10' 
and every propensity = 10~ T (the propensity sum equals one). The dashed line corresponds to 
the theoretical optimal bin width W = y/2 that minimizes the total search depth. The solid line 
corresponds to IF = 16, which is the target bin width used in practice. The algorithm performs 
well over a fairly wide range of bin widths and number of bins. As the problem size increases, the 
optimal number of bins increases roughly proportional to \[M. 

the square root of the number of reaction channels. In practice, choosing K = 20 \[M leads 
to good performance across a wide range of problem sizes, though the optimal value may 
vary across different system architectures. In the case where many of the reaction channels 
have zero propensity, it is more efficient to use the average number of nonzero propensity 
channels instead of M in computing the number of bins. To facilitate rebuilding the table, 
we record the number of steps since the last rebuild. This allows for the bin width to be 
chosen adaptively based on the simulation step sizes used most recently. This adaptive bin 
sizing strategy partially mitigates the problem of suboptimal bin widths that may arise due 
to changing propensity sums. Overall, the constant-complexity NRM algorithm with a fixed 
target relative bin width and dynamic rebuilding strategy exhibits excellent efficiency across 
a wide range of problem sizes as demonstrated in the next section. 

V. NUMERICAL EXPERIMENTS 

In this section we demonstrate the performance and scaling of the constant-complexity 
NRM (“NRM (constant)”) relative to other popular methods. Among the other methods 
considered are the constant-complexity direct method (“Direct (constant)” composition- 
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rejection algorithm) of Slepoy et a/P^, the original NRM (“NRM (binary)”) of Gibson and 
Brack®, and the NSM of Elf and Ehrenberg 35 . The algorithms were implemented in C++, 
using code from StochKit2 36 and Cain! 37 -! where possible. Pseudocode that outlines the 
constant-complexity NRM is given in Appendix A. All timing experiments were conducted 
on a Macbook Pro laptop with a 2.4 GHz Core i5 processor and 8 GB of memory. 


A. Reaction Generator Test 

Most exact SSA variants can be viewed as either a Direct Method or NRM implementa¬ 
tion with varying data structures used to select the next reaction. The performance of the 
reaction generator data structure is the primary determinant of the overall algorithm per¬ 
formance. In this section we test the efficiency of several reaction generator data structures, 
independent of the rest of the solver code, by simulating the “arrivals” of a network of M 
Poisson processes. 

As shown in Figure [4], methods utilizing simple data structures with low overhead per¬ 
form best on small to moderate sized problems. The example in Figure [4] used a random 
network model of unit-rate Poisson processes with a relatively high degree of connectivity 
(10 updates required for each step; note that the data structure updates were performed 
as if the propensities were changing, even though they were always set to unit rates.). The 
original NRM, implemented with a binary min-heap, would perform better relative to the 
others if fewer updates were required at each step. The constant-complexity NRM exhibits 
small timing fluctuations due to the method being tuned for much larger problems. In Fig¬ 
ure [5j we see that the constant-complexity direct method and constant-complexity NRM 
method outperform the others on large problems, with the constant NRM performing best. 
However, we see that the 0(1) scaling does not appear constant across large problem sizes. 
This is due to the effects of using progressively larger amounts of memory. Running the 
same experiments on a different system architecture could lead to differences in crossing 
points between methods, but the overall trends should be similar. 
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FIG. 4. Scaling on small problems. Elapsed time in seconds to generate the reaction index and 
update the data structure 10' times for various reaction generators. Each reaction channel has 
a unit rate propensity and a random network in which 10 updates are required was generated. 
For extremely small models, where M < 100 or so, the original direct method with linear search 
performs best. As the problem size increases, the direct method with a 3D search is optimal. Not 
shown is the direct method with 2D search, which slightly outperforms 3D search when M < 5000. 
The constant-complexity NRM performance exhibits some fluctuations because the implementation 
was not optimized for small problems. 


B. 3D Spatial Model 


The next subvolume method is a popular method for simulating spatial models. The 
NSM is different from the other methods considered here in that information about the 
spatial structure is built in to the algorithm. Therefore, it does not make sense to test the 
NSM reaction generator independent of a spatial model and full solver implementation. 


Here we compare the NSM, the constant-complexity direct method, and the constant- 
complexity NRM on a model using a 3D geometry comprised of equal sized cubic subvolumes 
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Method 

NRM (constant) 
Direct (constant) 
NRM (binary) 

Direct (3D search) 
Direct (linear search) 



FIG. 5. Scaling on large problems. Under the same conditions as Figure [4] with larger problem 
sizes, the constant-complexity methods outperform the others. Although the 0(1) algorithms scale 
roughly constant across a wide range of moderate problem sizes, as the problem size becomes large, 
the increased memory demands lead to imperfect scaling. 

with periodic boundary conditions. The reactions and parameters are shown below 35 . 

E A ^E A +A 

E b —b E b + B 

E a + B ^ E a B 

k d 

e a b + b^ e a b 2 

kd 

e b + a ^ e b a 

kd 

E b A + A ^ E b A 2 

kd 



k\ = 150 s -1 , k a = 46.2 (/rM) _1 s _1 , 
k d = 3.82 s -1 , = 6 s' 1 

D = 10~ 8 cm 2 s _1 

[E a ] (0) = [£ fl ](0) = 12.3 nM. 
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The diffusion constant D is equal for all species. The rate constant for all diffusive transfer 
events is therefore D /Z 2 , where l is the subvolume side length 35 . We note that the first two 
reaction channels are a common motif used to model processes such as protein production 
for an activated gene or, in this case, enzyme-catalyzed product formation in the presence of 
excess substrate, where the rate constant k\ implicitly accounts for the effectively constant 
substrate population. 

It is possible to scale up the number of transition (reaction and diffusion) channels by 
changing the system volume or changing the subvolume side length. First, we consider 
a large volume, with domain side length 12pm and subvolume side lengths ranging from 
0 .6pm to 0.2p, corresponding to a range of 8000 to 216000 subvolumes, respectively (see 
Fig. NIC in the Supplementary Material of Elf and Ehrenber^- 35 l). As shown in Figure [6j 
the constant-complexity NRM outperforms the NSM and constant-complexity direct method 
for problems larger than M = 480000, which corresponds to meshes finer than 20 3 = 8000 
subvolumes. 


500-" 



100 - 


0 - 


480000 3840000 7500000 12960000 

Total Channels 


NRM (constant) 
Direct (constant) 
NSM 


FIG. 6. 3D spatial model with system volume V = 12 3 pm 3 . Plotted is the elapsed time in seconds 
to execute 10 8 simulation steps. Below M = 480000 channels, the NSM is more efficient than 
the constant-complexity methods. Above M = 480000 channels, the constant-complexity NRM is 
about 20% faster than the constant-complexity direct method. 

We next consider the same 3D model with system volume V = 6 3 pm 3 . As shown in 
Figure [7J the constant-complexity NRM still achieves a benefit over the NRM and constant- 
complexity direct method, albeit a smaller improvement. In this example, there are approx¬ 
imately 28000 molecules in the system after the initial transient. At the finest resolution in 
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Figure [7] there are 216000 subvolumes, of which most contain no molecules. Therefore, the 
majority of the reaction channels are effectively switched off, or inactive, with propensity 
zero. In this example at the finest resolution, typically fewer than 2 x 10 5 of the nearly 
1.3 x 10' channels have nonzero propensities. The constant-complexity direct method and 
the NSM exclude zero propensity reactions from their reaction selection data structures, 
effectively reducing the problem size to the number of nonzero channels. The constant- 
complexity NRM does not benefit much from having many zero propensity channels. 

Changing the spatial discretization influences the rates of 0 th order and bimolccular reac¬ 
tions and diffusion events. For instance, using a finer discretization increases the frequency 
of diffusion events. This means that more simulation steps are required to reach a fixed sim¬ 
ulation end time. Changing the relative frequencies of different reaction channels influences 
simulation performance, though the effect is typically small for all methods. In the NSM, 
for example, increasing the relative frequency of diffusion events will improve performance 
slightly if there are fewer diffusion directions (e.g four for a 2D Cartesian mesh) than reac¬ 
tion channels for each subvolume because the average “search depth” will be weighted more 
heavily toward the smaller diffusion event search. 

500 

400 

1300 

F 

■o 

<1) 

g-200 

LU 

100 

0 

480000 3840000 7500000 12960000 

Total Channels 




• A 


NRM (constant) 
Direct (constant) 
NSM 


FIG. 7. 3D spatial model with system volume V = 6 3 //m 3 . At this smaller system volume, when 
the mesh is highly refined, most subvolumes are empty. Hence many of the reaction channels have 
zero propensity. The constant-complexity direct method and the NSM exclude all events with 
propensity zero from their data structures. The constant-complexity NRM uses the number of 
active (nonzero propensity) reactions to determine the number of bins K to use, but this algorithm 
does not benefit much from having many zero-propensity reaction channels. 
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VI. DISCUSSION AND CONCLUSIONS 


We have shown it is possible to formulate a constant-complexity version of the NRM 
that is efficient for exact simulation of large discrete stochastic models by using event time 
binning. Rather than targeting a load factor as one would with a hash table, the key 
to consistent efficiency of the algorithm is to choose the bin width based on the average 
propensity sum (and, therefore, step size) of the simulation. The examples in Section [V] 
demonstrate the advantages and some of the disadvantages of the constant-complexity NRM. 
The algorithm is not well suited for small models. However, for models with a large number 
of active channels and timescales that do not vary too rapidly, the constant-complexity NRM 
is often more efficient than other popular methods. 

For models with many inactive (zero propensity) channels, the performance of some SSA 
variants depends on the number of active channels rather than the total number of channels. 
The NSM scales proportional to the logarithm of the number of subvolumes containing active 
channels. The constant-complexity direct and NRM methods scale 0(1) in algorithmic 
complexity, but their performance does depend on the amount of memory used. Both 
constant-complexity methods have memory requirements for their reaction generator data 
structures that scale roughly proportional to the number of active channels. However, the 
table rebuild step in the constant-complexity NRM method scales as O(M). This typically 
constitutes a small fraction of the total computational cost (e.g. < 3% for the largest 

problem in Figure [5]). However, in the case of extremely large M and an extremely small 
number of active channels, the relative cost of rebuilding the table in the constant-complexity 
NRM becomes more significant. In an extreme case, other methods such as the constant- 
complexity direct method, NSM, and original NRM may be more efficient. 

It may be possible to modify the constant-complexity NRM to make it less sensitive to 
changes in the average simulation step size and number of active channels. The current 
dynamic table rebuilding strategy handles this well in many cases. However, in the case of 
extreme changes one could implement a “trigger” that initiates a table rebuild if changes 
in the timescale or number of active channels exceeds a threshold. One could also envision 
utilizing step size data from previous realizations to guide the binning strategy, possibly 
even utilizing unequal bin sizes, to further improve performance. 

Performance comparisons are inherently implementation, model, and system architecture 
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dependent. While we have attempted to present a fair comparison and the general algorithm 
analysis is universal, the exact simulation elapsed times may vary in different applications. 
The constant-complexity NRM presented here is an efficient method in many situations but 
inappropriate in others. As modelers develop larger and more complex models and as spatial 
models become more common, this algorithm provides a valuable exact option among the 
large family of exact and approximate stochastic simulation algorithms. 
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Appendix A: Algorithm Pseudocode 

The following pseudocode is representative of an implementation of this algorithm 
method. 

Model: propensities, u, dependencyGraph 
DataStructure: table, lowerBound, binWidth, bins 
procedure NRM(xO, tFinal) 
t i — 0 

x rrO 

buildDataStructureQ 
while t < tFinal do 

event selectReactionQ 
t <(— event.time 
x <— x + v(event.index) 
updateD ata Structure (event .index) 

% store output as desired 
end while 
end procedure 
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procedure SELECTREACTION 

% return min event time and index 
% first, locate bin index of smallest event 
while table (min Bin) .is Empty () do 
minBin <— minBin + 1 
if minBin > bins then 
buildDataStructureQ 
end if 
end while 

% smallest event time is in table(minBin) 

% find and return smallest event time and index 
return min(table(minBin )) 

end procedure 

procedure buildDataStructure 
lower Bound •(— t 

% default 20*sqrt(ACTIVE channels) 
bins 20 * sqrt(propensities .size) 

% default 16*step size 
% in practice, an approximation to 
% snm(propensities) is used 
binWidth 16 /sum(propensities) 

for i —V.propensities.size do 
rate propensities(i) 

r 4— exponential (rate) 
eventTime(i) t + r 
table .insert (i, eventTime (i)) 
end for 
minBin 0 
end procedure 

procedure updateDataStructure( index) 
for i in dependencyGraph(index) do 
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oldTime 4— eventTime (index) 

oldBin ComputeBinlndex(oldTime) 

rate 4— propensities(i) 

r 4 — exponential (rate) 

eventTime(i) t + r 

bin 4— ComputeBinlndex(eventTime(i)) 

if bin oldBin then 

table (oldBin) :remove(i) 
table.insert(i , eventTime(i j) 
end if 
end for 

end procedure 

procedure TABLE.INSERT(i, time) 
bin <— computeBinlndex(time) 

% insert into array 
table(bin).insert(i, time) 

end procedure 

procedure COMPUTEBlNlNDEx(time) 
offset •(— time — lower Bound, 
range lower Bound, * binWidth * bins 

bin 4 — integer (of f set/range * bins) 

return bin 

end procedure 
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